Polyglot: Distributed Word Representations for Multilingual NLP

Authors: Rami Al-Rfou, Bryan Perozzi, Steven Skiena
Publication date: 27/06/2014

Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word embeddings for more than 100 languages using their corresponding Wikipedias. We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part-of-speech tagger for a subset of these languages. We find their performance to be competitive with near state-of-the-art methods in English, Danish and Swedish. Moreover, we investigate the semantic features captured by these embeddings through the proximity of word groupings. We will release these embeddings publicly to help researchers in the development and enhancement of multilingual applications.

Comment: 10 pages, 2 figures, Proceedings of Conference on Computational Natural Language Learning CoNLL'201

arXiv.org e-Print Archive

CiteSeerX

The Expressive Power of Word Embeddings

Authors: Rami Al-Rfou, Yanqing Chen, Bryan Perozzi, Steven Skiena
Publication date: 29/05/2013

We seek to better understand the differences in quality among several publicly released embeddings. We propose several tasks that help to distinguish the characteristics of different embeddings. Our evaluation of sentiment polarity and synonym/antonym relations shows that embeddings are able to capture surprisingly nuanced semantics even in the absence of sentence structure. Moreover, benchmarking the embeddings shows great variance in the quality and characteristics of the semantics captured by the tested embeddings. Finally, we show the impact of varying the number of dimensions and the resolution of each dimension on the useful features captured by the embedding space. Our contributions highlight the importance of embeddings for NLP tasks and the effect of their quality on the final results.

Comment: submitted to ICML 2013, Deep Learning for Audio, Speech and Language Processing Workshop. 8 pages, 8 figures

arXiv.org e-Print Archive

CiteSeerX